HW 4

Question 1

Explain the following terms:

  1. Cannon’s algorithm: Cannon's algorithm is a well-known parallel matrix multiplication algorithm that distributes the work of matrix multiplication over the processors of a multi-processor system. It uses a 2-dimensional block partitioning to divide the large matrices into smaller sub-matrices, which are placed on a 2D grid of processors; each processor repeatedly multiplies and accumulates its local sub-matrices while the sub-matrices are cyclically shifted, and the blocks computed by all processors together form the final product matrix.
  2. Blocking and non-blocking send/receive: A blocking send/receive does not return until the operation is complete: the sender waits until the message has been received (or safely buffered) before continuing, and the receiver waits until the message has arrived. A non-blocking send/receive returns immediately and the communication proceeds in the background, so the process must later verify completion (e.g., with a wait call) before reusing the buffers. A minimal sketch contrasting the two appears after this list.
  3. Single Program Multiple Data: A parallel computing approach where a single program is executed on multiple processors or cores simultaneously, each processing its own set of data.
  4. Loosely synchronous: A style of parallel execution in which processes alternate between phases of independent computation and phases of communication; the processes synchronize at these phase boundaries rather than running in tight lock-step at every instruction.
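A minimal sketch contrasting blocking and non-blocking calls, assuming a standard MPI implementation (compile with mpicc, run with exactly two ranks); the variable names and values are illustrative, not from the assignment. The same binary runs on every rank, which also illustrates the Single Program Multiple Data style:

```c
/* A minimal sketch: ranks 0 and 1 exchange one int, first with blocking
 * calls, then with non-blocking calls. Assumes exactly 2 ranks
 * (e.g., mpirun -np 2 ./a.out). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, peer, sendval, recvval;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    peer = 1 - rank;            /* assumes exactly 2 ranks */
    sendval = rank;

    /* Blocking exchange: to avoid the deadlock discussed in Question 2,
     * the two ranks issue send and receive in opposite orders. */
    if (rank == 0) {
        MPI_Send(&sendval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
        MPI_Recv(&recvval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(&recvval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&sendval, 1, MPI_INT, peer, 0, MPI_COMM_WORLD);
    }
    printf("rank %d (blocking) got %d\n", rank, recvval);

    /* Non-blocking exchange: both calls return immediately; MPI_Waitall
     * blocks until the message has arrived and the buffers are reusable. */
    MPI_Isend(&sendval, 1, MPI_INT, peer, 1, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&recvval, 1, MPI_INT, peer, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("rank %d (non-blocking) got %d\n", rank, recvval);

    MPI_Finalize();
    return 0;
}
```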

Question 2

Part 1

Deadlock is possible because both programs try to send a message to each other on line 3. With blocking send/receive, both processes stop at line 3 waiting for the other to receive the message, but neither ever reaches its receive call. Reversing the send and receive order in program A avoids the deadlock, as the sketch below shows.
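The assignment's programs A and B are not reproduced in this write-up; the following is a minimal sketch of the fix, assuming each program originally issued a blocking send followed by a blocking receive (all names and values are illustrative). Program A (rank 0) has its order reversed, so B's send is matched immediately:

```c
/* Sketch of the deadlock fix (hypothetical stand-in for programs A/B,
 * which both blocked on their sends). Assumes exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, out = 42, in;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                      /* program A, order reversed */
        MPI_Recv(&in, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else {                              /* program B, unchanged */
        MPI_Send(&out, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        MPI_Recv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    printf("rank %d received %d\n", rank, in);
    MPI_Finalize();
    return 0;
}
```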

Part 2

No, it is not safe: program A may read incorrect data from B if B has not yet sent the data to A, or is still in the middle of copying it into buffer a. A sketch of the hazard and its fix follows.
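Since the original programs are not reproduced here, the following is a minimal sketch of the hazard under the assumption that A posts a non-blocking receive into a buffer a (the buffer names and values are hypothetical). Reading a before a wait call confirms completion is unsafe; the fix is to call MPI_Wait first:

```c
/* Sketch of the unsafe pattern: rank 0 posts a non-blocking receive
 * and must NOT read the buffer until MPI_Wait confirms the message
 * has actually arrived. Assumes exactly 2 ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, a = -1, b = 7;
    MPI_Request req;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Irecv(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* UNSAFE here: printf("%d\n", a); -- B may not have sent yet,
         * or the message may still be in flight. */
        MPI_Wait(&req, MPI_STATUS_IGNORE);   /* now a is valid */
        printf("rank 0 safely read a = %d\n", a);
    } else if (rank == 1) {
        MPI_Send(&b, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```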

Question 3

Part 1

  1. Initialize matrices $a$ and $b$ and the MPI size and rank.

  2. Initially, each $p_{i,j}$ has $a_{i,j}$ and $b_{i,j}$. Align elements $a_{i,j}$ and $b_{i,j}$ by reordering them so that $a_{i,j+i}$ and $b_{i+j,j}$ are on $p_{i,j}$.

  3. Each $p_{i,j}$ computes for the first round

    $c_{i,j} = a_{i,j+i} \cdot b_{i+j,j}$ ($a_{i,j+i}$ and $b_{i+j,j}$ are local on $p_{i,j}$)

  4. For $k = 1$ to $n-1$ (see the runnable sketch after this list):

    // Shift matrix $a$ left by one

    MPI_Send($a$, $p_{i,j-1}$)

    MPI_Recv($a$, $p_{i,j+1}$)

    // Shift matrix $b$ up by one

    MPI_Send($b$, $p_{i-1,j}$)

    MPI_Recv($b$, $p_{i+1,j}$)

    $c = c + a \cdot b$
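A runnable sketch of the steps above, assuming one matrix element per process on an $n \times n$ grid (the initial values, e.g. $b$ as an identity matrix, are illustrative). MPI_Sendrecv_replace pairs each blocking send with its matching receive, which implements the shifts without the deadlock risk of unmatched MPI_Send/MPI_Recv ordering:

```c
/* Cannon's algorithm, blocking version, one element per process.
 * Compile: mpicc cannon.c -lm   Run: mpirun -np 9 ./a.out (n = 3).
 * A periodic Cartesian communicator gives the cyclic left/up shifts. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int size, rank, n, coords[2];
    int dims[2], periods[2] = {1, 1};
    MPI_Comm grid;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    n = (int)(sqrt((double)size) + 0.5);        /* assumes size = n*n */
    dims[0] = dims[1] = n;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);
    int i = coords[0], j = coords[1], left, right, up, down;

    /* Steps 1-2: local elements a_{i,j}, b_{i,j} (example values), then
     * the initial alignment: shift row i of a left by i and column j of
     * b up by j, so that a_{i,j+i} and b_{i+j,j} land on p_{i,j}. */
    double a = i * n + j + 1.0, b = (i == j) ? 1.0 : 0.0, c = 0.0;
    MPI_Cart_shift(grid, 1, -i, &right, &left);  /* row shift by i */
    MPI_Sendrecv_replace(&a, 1, MPI_DOUBLE, left, 0, right, 0, grid, MPI_STATUS_IGNORE);
    MPI_Cart_shift(grid, 0, -j, &down, &up);     /* column shift by j */
    MPI_Sendrecv_replace(&b, 1, MPI_DOUBLE, up, 0, down, 0, grid, MPI_STATUS_IGNORE);

    /* Step 3: first multiply-accumulate round. */
    c += a * b;

    /* Step 4: n-1 rounds of shift-left / shift-up / accumulate. */
    MPI_Cart_shift(grid, 1, -1, &right, &left);
    MPI_Cart_shift(grid, 0, -1, &down, &up);
    for (int k = 1; k < n; k++) {
        MPI_Sendrecv_replace(&a, 1, MPI_DOUBLE, left, 0, right, 0, grid, MPI_STATUS_IGNORE);
        MPI_Sendrecv_replace(&b, 1, MPI_DOUBLE, up, 0, down, 0, grid, MPI_STATUS_IGNORE);
        c += a * b;
    }
    printf("p_{%d,%d}: c = %g\n", i, j, c);
    MPI_Finalize();
    return 0;
}
```

With $b$ initialized to the identity, $c$ on each process should equal its original $a$ value, which makes the shifts easy to verify.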

Part 2

  1. Initialize matrices $a$ and $b$ and the MPI size and rank.

  2. Initially, each $p_{i,j}$ has $a_{i,j}$ and $b_{i,j}$. Align elements $a_{i,j}$ and $b_{i,j}$ by reordering them so that $a_{i,j+i}$ and $b_{i+j,j}$ are on $p_{i,j}$.

  3. Each $p_{i,j}$ computes for the first round

    $c_{i,j} = a_{i,j+i} \cdot b_{i+j,j}$ ($a_{i,j+i}$ and $b_{i+j,j}$ are local on $p_{i,j}$)

  4. For $k = 1$ to $n-1$ (see the sketch after this list):

    // Send shifted $a$ and $b$ consecutively

    MPI_Isend($a$, $p_{i,j-1}$)

    MPI_Isend($b$, $p_{i-1,j}$)

    Barrier(all sent)

    // Receive shifted $a$ and $b$ consecutively

    MPI_Irecv($a$, $p_{i,j+1}$)

    MPI_Irecv($b$, $p_{i+1,j}$)

    Barrier(all received)

    $c = c + a \cdot b$
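A runnable sketch of the non-blocking version under the same one-element-per-process assumption. The pseudocode's Barrier(all sent)/Barrier(all received) steps are realized here with a single MPI_Waitall; the receives go into separate buffers (na, nb, hypothetical names) and are posted before waiting, so a and b are never overwritten while their Isends are still in flight:

```c
/* Cannon's algorithm, non-blocking version, one element per process.
 * Compile: mpicc cannon_nb.c -lm   Run with n*n ranks as before. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int size, rank, n, coords[2], dims[2], periods[2] = {1, 1};
    MPI_Comm grid;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    n = (int)(sqrt((double)size) + 0.5);        /* assumes size = n*n */
    dims[0] = dims[1] = n;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &grid);
    MPI_Comm_rank(grid, &rank);
    MPI_Cart_coords(grid, rank, 2, coords);
    int i = coords[0], j = coords[1], left, right, up, down;

    /* Steps 1-3 as in Part 1: local values, initial alignment, first round. */
    double a = i * n + j + 1.0, b = (i == j) ? 1.0 : 0.0, c, na, nb;
    MPI_Cart_shift(grid, 1, -i, &right, &left);
    MPI_Sendrecv_replace(&a, 1, MPI_DOUBLE, left, 0, right, 0, grid, MPI_STATUS_IGNORE);
    MPI_Cart_shift(grid, 0, -j, &down, &up);
    MPI_Sendrecv_replace(&b, 1, MPI_DOUBLE, up, 0, down, 0, grid, MPI_STATUS_IGNORE);
    c = a * b;

    /* Step 4: non-blocking shifts; one MPI_Waitall plays the role of
     * the two "barriers" in the pseudocode. */
    MPI_Cart_shift(grid, 1, -1, &right, &left);
    MPI_Cart_shift(grid, 0, -1, &down, &up);
    for (int k = 1; k < n; k++) {
        MPI_Request reqs[4];
        /* Post the receives first, then the sends, then wait for all. */
        MPI_Irecv(&na, 1, MPI_DOUBLE, right, 0, grid, &reqs[0]);
        MPI_Irecv(&nb, 1, MPI_DOUBLE, down, 1, grid, &reqs[1]);
        MPI_Isend(&a, 1, MPI_DOUBLE, left, 0, grid, &reqs[2]);
        MPI_Isend(&b, 1, MPI_DOUBLE, up, 1, grid, &reqs[3]);
        MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
        a = na;
        b = nb;
        c += a * b;
    }
    printf("p_{%d,%d}: c = %g\n", i, j, c);
    MPI_Finalize();
    return 0;
}
```

Posting the receives before waiting (rather than waiting on the sends first, as a literal reading of the pseudocode would do) ensures every Isend has a matching receive available and cannot stall on large, unbuffered messages.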

Part 3

Data communicated in each iteration is $2\left(\frac{n}{p}\right)^2 \cdot p^2 = 2n^2$.

We have $p-1$ iterations, so in total we have $2(p-1)n^2$.
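As a concrete check with hypothetical sizes ($n = 1024$ on a $4 \times 4$ process grid, so $p = 4$):

```latex
% Each of the p^2 = 16 processes shifts two (n/p)^2 blocks per iteration:
\[
  2\left(\frac{1024}{4}\right)^2 \cdot 4^2 = 2 \cdot 256^2 \cdot 16 = 2 \cdot 1024^2 = 2n^2
\]
% and over p - 1 = 3 iterations the total is
\[
  2(p-1)n^2 = 2 \cdot 3 \cdot 1024^2 \approx 6.29 \times 10^6 \text{ elements.}
\]
```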